Abstract

In this paper, we study the use of “sign α-stable random projections” (where 0 < α ≤ 2) for building basic
data processing tools in the context of large-scale machine learning applications (e.g., classification,
regression, clustering, and near-neighbor search). After processing by sign stable random projections,
the inner products of the processed data approximate various types of nonlinear kernels, depending on
the value of α. Thus, this approach provides an effective strategy for approximating nonlinear learning
algorithms essentially at the cost of linear learning. When α = 2, it is known that the corresponding
nonlinear kernel is the arc-cosine kernel. When α = 1, the procedure approximates the arc-cos-χ² kernel
(under certain conditions). When α → 0+, it corresponds to the resemblance kernel, which provides an
exciting connection between two popular randomized algorithms: (i) stable random projections and (ii) b-bit
minwise hashing. No theoretical results are known so far for α values other than 2, 1, or 0+.

From a practitioner’s perspective, the method of sign α-stable random projections is ready to be tested
for large-scale learning applications, where α can simply be viewed as a tuning parameter. What is
missing in the literature is an extensive empirical study showing the effectiveness of sign stable random
projections, especially for α ≠ 2 or 1. This paper supplies such a study on a wide variety of classification
datasets. In particular, we compare sign stable random projections side by side with the recently
proposed “0-bit consistent weighted sampling (CWS)” [?] (which applies only to nonnegative data). We
provide detailed comparisons on all 34 datasets used by [?]. In addition, we present the comparison on a
larger dataset with 350,000 examples. For all datasets, we experiment with
α ∈ {0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2}. For most datasets, sign stable random projections can
approach (or in some cases even slightly exceed) the performance of 0-bit CWS, given enough projections.
Typically, to reach the same accuracy, sign stable random projections require significantly more
projections than the number of samples needed by 0-bit CWS. There are also datasets for which sign
stable random projections could not achieve the same accuracy as 0-bit CWS, regardless of α.

While the comparison results seem to favor 0-bit consistent weighted sampling (which applies only to
nonnegative data), the distinct advantage of sign stable random projections is that the method is applicable
to general data types, not only to nonnegative data. It is also an interesting research problem to combine
0-bit CWS with sign stable random projections, for example, using a strategy similar to “CoRE kernels” [?].

1 Introduction


In this paper, we focus on the idea of “sign α-stable random projections” and its applications in machine
learning with massive (and possibly streaming [?]) data. Consider two data vectors u, v ∈ ℝ^D from a data
matrix. The central idea is to multiply them by a random projection matrix {s_ij}, i = 1, ..., D, j = 1, ..., k,
whose entries are sampled i.i.d. from an α-stable distribution, denoted by S(α, 1). That is,

$$x_j = \sum_{i=1}^{D} u_i s_{ij}, \qquad y_j = \sum_{i=1}^{D} v_i s_{ij}, \qquad s_{ij} \sim S(\alpha, 1) \ \text{i.i.d.}, \quad j = 1, 2, \ldots, k \tag{1}$$

The use of α-stable distributions was studied in the context of estimating frequency moments of data
streams [?, ?] and in the recent work on “one scan 1-bit compressed sensing” [?]. Here, we adopt the
parameterization [?] such that, if s ∼ S(α, d), then the characteristic function is
$E\left(e^{\sqrt{-1}\,s t}\right) = e^{-d|t|^{\alpha}}$.

When α = 2, S(2, d) is equivalent to a Gaussian distribution N(0, σ² = 2d). When α = 1, S(1, 1) is the
standard Cauchy distribution. Although in general no closed-form density functions of α-stable distributions
are available, one can easily sample from an α-stable distribution, e.g., by the classical CMS method [?].
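To make the CMS approach concrete, here is a minimal NumPy sketch. It assumes the standard symmetric (zero-skewness) form of the Chambers-Mallows-Stuck transform. The helper names `sample_stable` and `empirical_char_fn` are hypothetical and do not come from the paper. Under the parameterization above, the empirical characteristic function E(cos(ts)) should be close to exp(-|t|^α) for every α.

```python
import numpy as np


def sample_stable(alpha, size, rng):
    """Draw i.i.d. samples from S(alpha, 1), i.e. E[exp(i*t*s)] = exp(-|t|**alpha),
    with the symmetric Chambers-Mallows-Stuck transform (0 < alpha <= 2).
    alpha = 1 reduces to tan(w), a standard Cauchy draw; alpha = 2 gives N(0, 2)."""
    w = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle on [-pi/2, pi/2)
    e = rng.exponential(1.0, size)                  # independent Exp(1) draws
    return (np.sin(alpha * w) / np.cos(w) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * w) / e) ** ((1.0 - alpha) / alpha))


def empirical_char_fn(samples, t):
    # The law is symmetric, so E[exp(i*t*s)] = E[cos(t*s)] is real.
    return float(np.mean(np.cos(t * samples)))


rng = np.random.default_rng(0)
for alpha in (0.5, 1.0, 1.5, 2.0):
    s = sample_stable(alpha, 200_000, rng)
    # At t = 1 the target exp(-|t|**alpha) equals exp(-1) for every alpha.
    print(alpha, round(empirical_char_fn(s, 1.0), 3), round(float(np.exp(-1.0)), 3))
```

The law is symmetric, so its characteristic function is real and averaging cos(ts) is enough to check the parameterization. At α = 2, the sample variance should also be close to 2, which matches N(0, 2d) with d = 1.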

Stable distributions with α < 2 are also known as “heavy-tailed” distributions because, if s ∼ S(α, 1),
then unless α = 2 we always have E(|s|^λ) = ∞ whenever λ ≥ α. This is probably why stable
distributions have rarely been used in machine learning and data mining applications.


1.1 Sign Stable Random Projections 

By the properties of stable distributions, we have $x_j \sim S\left(\alpha, \sum_{i=1}^{D}|u_i|^{\alpha}\right)$ and
$y_j \sim S\left(\alpha, \sum_{i=1}^{D}|v_i|^{\alpha}\right)$, j = 1, 2, ..., k. Unless α = 2, it might be difficult to imagine how one
could make use of these (artificially generated) heavy-tailed data in machine learning applications. Indeed,
we do not use the projected data directly. Instead, in this paper, we use the projected data only through
their signs, i.e., sign(x_j) and sign(y_j), which are well behaved and can be used for building tools for
large-scale machine learning.
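The stability property can be checked numerically. The sketch below makes a few assumptions: it uses NumPy, it includes a small CMS sampler so that the block runs on its own, and it uses an arbitrary hypothetical vector u with mixed signs. If x_j ∼ S(α, Σ|u_i|^α), the empirical mean of cos(t x_j) should be close to exp(-Σ|u_i|^α |t|^α).

```python
import numpy as np


def sample_stable(alpha, size, rng):
    # Chambers-Mallows-Stuck sampler for S(alpha, 1): E[exp(i*t*s)] = exp(-|t|**alpha).
    w = rng.uniform(-np.pi / 2, np.pi / 2, size)
    e = rng.exponential(1.0, size)
    return (np.sin(alpha * w) / np.cos(w) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * w) / e) ** ((1.0 - alpha) / alpha))


alpha, k = 1.5, 200_000
u = np.array([0.5, -1.0, 0.25])                  # hypothetical data vector (signs allowed)
rng = np.random.default_rng(0)
s = sample_stable(alpha, (u.shape[0], k), rng)   # D x k projection matrix {s_ij}
x = u @ s                                        # k projected values x_j
scale = float(np.sum(np.abs(u) ** alpha))        # predicted d = sum_i |u_i|^alpha
for t in (0.5, 1.0, 2.0):
    emp = float(np.mean(np.cos(t * x)))
    print(t, round(emp, 3), round(float(np.exp(-scale * t ** alpha)), 3))
```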

If x_j < 0, we code x_j as the two-dimensional vector [0 1]. If x_j > 0, we code it as [1 0]. We then
concatenate k such two-dimensional vectors to form a vector of length 2k (with k 1’s). We apply the
same coding scheme to y_j (and to all the projected data). The signs sign(x_j) and sign(y_j) are statistically
dependent, and it is interesting (and in general challenging) to find out how the signs are related.
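The following is a minimal NumPy sketch of this coding scheme. It includes a CMS sampler for S(α, 1) so that the block is self-contained, and it uses a hypothetical helper name, `sign_stable_codes`. As an assumption, it maps the probability-zero event x_j = 0 to [0 1]. The sketch shows that the inner product of two coded vectors equals the number of projections j with sign(x_j) = sign(y_j).

```python
import numpy as np


def sample_stable(alpha, size, rng):
    # Chambers-Mallows-Stuck sampler for S(alpha, 1).
    w = rng.uniform(-np.pi / 2, np.pi / 2, size)
    e = rng.exponential(1.0, size)
    return (np.sin(alpha * w) / np.cos(w) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * w) / e) ** ((1.0 - alpha) / alpha))


def sign_stable_codes(data, alpha, k, rng):
    """Project the rows of `data` (n x D) with one shared D x k matrix of i.i.d.
    S(alpha, 1) entries, keep only the signs, and code x_j > 0 as [1 0] and
    x_j <= 0 as [0 1] (x_j = 0 has probability zero). Returns an n x 2k 0/1
    matrix with exactly k ones per row, plus the raw projections."""
    s = sample_stable(alpha, (data.shape[1], k), rng)  # projection matrix {s_ij}
    x = data @ s                                        # Eq. (1), one row per vector
    pos = (x > 0).astype(np.int64)
    codes = np.empty((data.shape[0], 2 * k), dtype=np.int64)
    codes[:, 0::2] = pos          # first slot of each pair: 1 if x_j > 0
    codes[:, 1::2] = 1 - pos      # second slot: 1 if x_j <= 0
    return codes, x


rng = np.random.default_rng(0)
uv = rng.standard_normal((2, 50))   # two hypothetical data vectors u and v
k = 1024
codes, x = sign_stable_codes(uv, alpha=1.0, k=k, rng=rng)
inner = int(codes[0] @ codes[1])                       # inner product of the 2k-dim codes
collisions = int(np.sum((x[0] > 0) == (x[1] > 0)))     # number of j with equal signs
print(inner, collisions, inner / k)
```

The codes use int64 so that the inner product does not overflow as k grows. Dividing by k gives the empirical collision probability.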


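When α = 2, the relationship between sign(x_j) and sign(y_j) is well known [?, ?, ?]:

$$\alpha = 2:\quad \Pr\left(\mathrm{sign}(x_j) = \mathrm{sign}(y_j)\right) = 1 - \frac{1}{\pi}\cos^{-1}\rho_2, \qquad \rho_2 = \frac{\sum_{i=1}^{D} u_i v_i}{\sqrt{\sum_{i=1}^{D} u_i^2 \sum_{i=1}^{D} v_i^2}} \tag{2}$$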

Thus, the “collision probability” is monotonic in ρ_2, which is the correlation coefficient. Although cos^{-1} ρ_2
is nonlinear, the estimator of the probability, i.e., $\frac{1}{k}\sum_{j=1}^{k} 1\{\mathrm{sign}(x_j) = \mathrm{sign}(y_j)\}$, can be viewed as an
inner product once we expand each sign as either [0 1] or [1 0]. In other words, we only need to pay the cost
of linear learning to approximately train a classifier originally based on nonlinear kernels.
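Eq. (2) is easy to check by simulation. This is only a sketch, and it assumes NumPy and an arbitrary pair of correlated vectors. Rescaling x_j does not change sign(x_j), so the sketch replaces the N(0, 2) entries of S(2, 1) with standard normals. The hypothetical function `collision_rate_alpha2` computes the inner-product estimator described above, divided by k.

```python
import numpy as np


def collision_rate_alpha2(u, v, k, rng):
    # S(2, 1) is N(0, 2); signs are scale-invariant, so N(0, 1) entries suffice.
    s = rng.standard_normal((u.shape[0], k))
    x, y = u @ s, v @ s
    # (1/k) * sum_j 1{sign(x_j) = sign(y_j)}
    return float(np.mean((x > 0) == (y > 0)))


def theory_alpha2(u, v):
    rho2 = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return 1.0 - float(np.arccos(np.clip(rho2, -1.0, 1.0))) / np.pi   # Eq. (2)


rng = np.random.default_rng(0)
u = rng.standard_normal(100)
v = 0.6 * u + 0.8 * rng.standard_normal(100)   # hypothetical correlated pair
print(round(collision_rate_alpha2(u, v, 20_000, rng), 3), round(theory_alpha2(u, v), 3))
```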

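It is not so straightforward to calculate the collision probability once α < 2. A recent work [?] focused
on α = 1 and showed that, when u_i ≥ 0, v_i ≥ 0, $\sum_{i=1}^{D} u_i = 1$, and $\sum_{i=1}^{D} v_i = 1$, we have

$$\alpha = 1:\quad \Pr\left(\mathrm{sign}(x_j) = \mathrm{sign}(y_j)\right) \approx 1 - \frac{1}{\pi}\cos^{-1}\rho_{\chi^2}, \qquad \rho_{\chi^2} = \sum_{i=1}^{D}\frac{2u_i v_i}{u_i + v_i} \tag{3}$$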


Note that the so-called χ² kernel, ρ_χ², is popular in computer vision for data generated from histograms.
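The χ² similarity and the right-hand side of Eq. (3) can be computed directly and compared with the empirical collision rate under Cauchy projections. The sketch below assumes NumPy and randomly generated histograms, and its function names are hypothetical. Eq. (3) is an approximation, so the two printed numbers should be close but need not be identical. Two exact cases do hold:
- Identical inputs always collide.
- Vectors with disjoint supports give independent x_j and y_j, and hence a collision rate of 1/2.

```python
import numpy as np


def chi2_similarity(u, v):
    """rho_chi2 = sum_i 2 u_i v_i / (u_i + v_i) for nonnegative u, v that each
    sum to 1; coordinates with u_i + v_i = 0 contribute 0."""
    num = 2.0 * u * v
    den = u + v
    return float(np.sum(np.divide(num, den, out=np.zeros_like(num), where=den > 0)))


def cauchy_collision_rate(u, v, k, rng):
    # alpha = 1: S(1, 1) is the standard Cauchy distribution.
    s = rng.standard_cauchy((u.shape[0], k))
    x, y = u @ s, v @ s
    return float(np.mean((x > 0) == (y > 0)))


def approx_alpha1(u, v):
    # Right-hand side of Eq. (3): 1 - arccos(rho_chi2) / pi (an approximation).
    return 1.0 - float(np.arccos(np.clip(chi2_similarity(u, v), -1.0, 1.0))) / np.pi


rng = np.random.default_rng(0)
u = rng.random(200)
u /= u.sum()                         # hypothetical histogram summing to 1
v = rng.random(200)
v /= v.sum()
print(round(cauchy_collision_rate(u, v, 20_000, rng), 3), round(approx_alpha1(u, v), 3))
```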

When α → 0+, [?] mentioned in its “future work” section that the collision probability is related to the
“resemblance” when the data are nonnegative:

$$\alpha = 0+:\quad \Pr\left(\mathrm{sign}(x_j) = \mathrm{sign}(y_j)\right) = \frac{1}{2} + \frac{1}{2}\,\frac{\sum_{i=1}^{D} 1\{u_i > 0 \text{ and } v_i > 0\}}{\sum_{i=1}^{D} 1\{u_i > 0 \text{ or } v_i > 0\}} \tag{4}$$


Interestingly, this collision probability is essentially the same as the collision probability of “1-bit minwise
hashing” [?].
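The α → 0+ limit can be illustrated with a small α. This sketch makes several assumptions:
- NumPy and the CMS sampler are used.
- The two vectors are hypothetical and binary, so |u_i|^α = 1 on the support.
- α = 0.05 stands in for 0+.

The empirical collision rate should then be near 1/2 + R/2 from Eq. (4). It will not match exactly, because α is not truly at the limit. With this sampler, much smaller α can overflow double precision.

```python
import numpy as np


def sample_stable(alpha, size, rng):
    # Chambers-Mallows-Stuck sampler for S(alpha, 1).
    w = rng.uniform(-np.pi / 2, np.pi / 2, size)
    e = rng.exponential(1.0, size)
    return (np.sin(alpha * w) / np.cos(w) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * w) / e) ** ((1.0 - alpha) / alpha))


def resemblance(u, v):
    # R = |{i: u_i > 0 and v_i > 0}| / |{i: u_i > 0 or v_i > 0}|
    return float(np.sum((u > 0) & (v > 0)) / np.sum((u > 0) | (v > 0)))


def collision_rate(u, v, alpha, k, rng):
    s = sample_stable(alpha, (u.shape[0], k), rng)
    x, y = u @ s, v @ s
    return float(np.mean((x > 0) == (y > 0)))


u = np.zeros(40)
u[:20] = 1.0                      # hypothetical binary vector, support {0, ..., 19}
v = np.zeros(40)
v[10:30] = 1.0                    # support {10, ..., 29}: overlap 10, union 30
R = resemblance(u, v)
rng = np.random.default_rng(0)
est = collision_rate(u, v, 0.05, 20_000, rng)   # small alpha as a stand-in for 0+
print(round(R, 4), round(est, 3), round(0.5 + 0.5 * R, 3))
```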


For other α values, we currently cannot relate the collision probabilities to any known similarity
measures. On the other hand, the estimator $\frac{1}{k}\sum_{j=1}^{k} 1\{\mathrm{sign}(x_j) = \mathrm{sign}(y_j)\}$ (which is an inner product) is of course still
a valid positive definite kernel for any α. Thus, we can still use sign α-stable random projections for
building large-scale learning algorithms, where α can be viewed as an important tuning parameter. What is
missing in the literature is an extensive empirical study, and our paper supplies such a study.
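The sketch below makes the positive definiteness concrete. It uses NumPy, includes the CMS sampler, uses hypothetical data with negative entries, and picks α = 0.75 arbitrarily. It forms the kernel matrix as the inner products of the sign codes divided by k. Any Gram matrix of explicit feature vectors is positive semidefinite, so its smallest eigenvalue should be nonnegative up to rounding, and its diagonal is exactly 1.

```python
import numpy as np


def sample_stable(alpha, size, rng):
    # Chambers-Mallows-Stuck sampler for S(alpha, 1).
    w = rng.uniform(-np.pi / 2, np.pi / 2, size)
    e = rng.exponential(1.0, size)
    return (np.sin(alpha * w) / np.cos(w) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * w) / e) ** ((1.0 - alpha) / alpha))


def sign_gram(data, alpha, k, rng):
    """K[a, b] = (1/k) * sum_j 1{sign(x_aj) = sign(x_bj)}, computed as inner
    products of the 2k-dimensional sign codes, hence PSD for any alpha."""
    x = data @ sample_stable(alpha, (data.shape[1], k), rng)
    pos = (x > 0).astype(float)
    # Block layout [pos | 1 - pos] has the same inner products as interleaved pairs.
    codes = np.concatenate([pos, 1.0 - pos], axis=1)
    return codes @ codes.T / k


rng = np.random.default_rng(0)
data = rng.standard_normal((5, 30))    # hypothetical data with negative entries
K = sign_gram(data, alpha=0.75, k=2048, rng=rng)
print(np.round(K, 3))
print(np.linalg.eigvalsh(K).min())     # nonnegative up to rounding error
```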


1.2 Resemblance, Min-Max Kernel, and 0-Bit Consistent Weighted Sampling (CWS) 


As mentioned above, the collision probability of sign stable random projections at α = 0+ is related to the
resemblance R when the data (e.g., u and v) are nonnegative. From the definition

$$R = R(u, v) = \frac{\sum_{i=1}^{D} 1\{u_i > 0 \text{ and } v_i > 0\}}{\sum_{i=1}^{D} 1\{u_i > 0 \text{ or } v_i > 0\}}, \qquad u_i \ge 0,\ v_i \ge 0 \tag{5}$$


we can see that R only makes sense when the data are sparse (i.e., most entries are zero). When the data are
fully dense, we always have R = 1. This may seriously limit the use of resemblance when the data are not
sparse. This issue can be largely fixed by the introduction of the min-max kernel, which is defined as

$$K_{MM}(u, v) = \frac{\sum_{i=1}^{D}\min\{u_i, v_i\}}{\sum_{i=1}^{D}\max\{u_i, v_i\}}, \qquad u_i \ge 0,\ v_i \ge 0 \tag{6}$$

The recent work [?] also provides a variant, called the “normalized min-max kernel”:

$$K_{NMM}(u, v) = \frac{\sum_{i=1}^{D}\min\{u_i, v_i\}}{\sum_{i=1}^{D}\max\{u_i, v_i\}}, \qquad \sum_{i=1}^{D} u_i = 1,\ \sum_{i=1}^{D} v_i = 1 \tag{7}$$
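The three similarities in (5) to (7) each take one line to compute. The sketch below uses NumPy, and its function names are hypothetical. It takes two fully dense vectors to show the issue raised above: R = 1 even though u and v differ, while the min-max kernels still distinguish them.

```python
import numpy as np


def resemblance(u, v):
    # Eq. (5): fraction of the union of nonzero coordinates shared by u and v.
    return float(np.sum((u > 0) & (v > 0)) / np.sum((u > 0) | (v > 0)))


def min_max(u, v):
    # Eq. (6): sum_i min(u_i, v_i) / sum_i max(u_i, v_i), for u, v >= 0.
    return float(np.sum(np.minimum(u, v)) / np.sum(np.maximum(u, v)))


def normalized_min_max(u, v):
    # Eq. (7): the min-max kernel applied after scaling each vector to sum to 1.
    return min_max(u / u.sum(), v / v.sum())


u = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical fully dense vectors
v = np.array([2.0, 2.0, 2.0, 2.0])
print(resemblance(u, v), min_max(u, v), normalized_min_max(u, v))
```

For these vectors, R = 1, K_MM = 7/11, and K_NMM = 2/3.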


The resemblance is a popular measure of similarity for binary data and can be sampled efficiently by
minwise hashing [?]. The min-max kernels can also be sampled using the technique called consistent
weighted sampling (CWS) [?, ?]. Traditionally, each sample of CWS consists of two values, one of which
is unbounded. The so-called “0-bit” CWS [?] simply discards the unbounded value, which makes CWS much
more convenient for large-scale machine learning tasks.
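The source does not specify which CWS variant underlies 0-bit CWS. As an assumption, the sketch below uses Ioffe's Improved CWS (ICWS), in which each sample is a pair (i*, t_{i*}) and t_{i*} is the unbounded integer. It is a minimal NumPy illustration, not the authors' implementation, and the name `icws` is hypothetical. By the ICWS consistency property, the full-pair collision rate estimates K_MM. The 0-bit rate (indices only) is always at least as high, because matching pairs imply matching indices. How close the 0-bit rate stays to K_MM depends on the data.

```python
import numpy as np


def icws(data, k, rng):
    """Improved CWS for the rows of a nonnegative n x D matrix. Returns (idx, t):
    n x k arrays holding the sampled coordinate i* and its integer t_{i*}.
    0-bit CWS keeps only idx and discards the unbounded t."""
    d = data.shape[1]
    # One set of random draws per (sample, coordinate), shared by all rows:
    # this sharing is what makes the sampling "consistent".
    r = rng.gamma(2.0, 1.0, (k, d))
    log_c = np.log(rng.gamma(2.0, 1.0, (k, d)))
    beta = rng.uniform(0.0, 1.0, (k, d))
    idx = np.empty((data.shape[0], k), dtype=np.int64)
    t_star = np.empty((data.shape[0], k), dtype=np.int64)
    for row, w in enumerate(data):
        pos = w > 0
        log_w = np.log(np.where(pos, w, 1.0))      # dummy log(1) where w_i = 0
        t = np.floor(log_w / r + beta)
        # log a_i, where a_i = c_i / (y_i * exp(r_i)) and y_i = exp(r_i * (t_i - beta_i))
        log_a = log_c - r * (t - beta + 1.0)
        log_a = np.where(pos, log_a, np.inf)       # zero coordinates are never selected
        best = np.argmin(log_a, axis=1)
        idx[row] = best
        t_star[row] = t[np.arange(k), best].astype(np.int64)
    return idx, t_star


rng = np.random.default_rng(0)
u = rng.random(50)
v = np.clip(u + 0.3 * rng.standard_normal(50), 0.0, None)   # hypothetical nonnegative pair
idx, t_star = icws(np.vstack([u, v]), k=10_000, rng=rng)
same_idx = idx[0] == idx[1]
full_rate = float(np.mean(same_idx & (t_star[0] == t_star[1])))   # full ICWS collisions
zero_bit_rate = float(np.mean(same_idx))                           # 0-bit collisions
kmm = float(np.minimum(u, v).sum() / np.maximum(u, v).sum())
print(round(full_rate, 3), round(zero_bit_rate, 3), round(kmm, 3))
```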

Because [?] experimented with a large collection of datasets, we compare sign stable random projections
side by side with 0-bit CWS. We should reiterate, however, that 0-bit CWS is designed only for nonnegative
data and is hence not as general as sign stable random projections.


2 Experiments 

2.1 Datasets and Summary of Results 

We have experimented with all 34 datasets used in the recent paper on “0-bit CWS” [?] to provide a
side-by-side comparison. The results are summarized in Table 1. They show that, given enough projections,
sign α-stable random projections can often achieve good accuracies (higher than those of the linear kernel).
The value of α is an important parameter that needs to be tuned individually for each dataset.

Table 1: Datasets and classification accuracies (in %). We use all the datasets in the recent work on “0-bit”
CWS [?]. We report the results of linear kernels, min-max kernels (6), normalized min-max kernels (7),
and sign α-stable random projections with α ∈ {0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2} and k = 8192.
The values for the linear kernel, min-max kernels, and n-min-max (or n-m-m) kernels are quoted directly
from [?]. For the min-max (and n-m-m) kernels, the accuracies were computed on the original data using
the LIBSVM “pre-computed” kernel functionality and ℓ2-regularized kernel SVM (which has a tuning parameter
C). The reported test classification accuracies are the best accuracies over a wide range of C values. The
reported accuracies of sign α-stable random projections (i.e., the last 9 columns) and of linear kernels
(ℓ2-regularized linear SVM) were computed by LIBLINEAR [?]. We highlight (in bold) the highest accuracies
among all methods, as well as the highest accuracies of sign α-stable random projections among the 9 α values.


| Dataset | # train | # test | linear | min-max | n-m-m | α = 0.1 | α = 0.25 | α = 0.5 | α = 0.75 | α = 1 | α = 1.25 | α = 1.5 | α = 1.75 | α = 2 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Covertype10k | 10,000 | 50,000 | 70.9 | 80.4 | 80.2 | 74.5 | 76.7 | 77.9 | 78.3 | 78.4 | 78.5 | 78.4 | 78.3 | 78.2 |
| Covertype20k | 20,000 | 50,000 | 71.1 | 83.3 | 83.1 | 76.5 | 78.4 | 79.8 | 80.3 | 80.4 | 80.4 | 80.7 | 80.5 | 80.3 |
| IJCNN5k | 5,000 | 91,701 | 91.6 | 94.4 | 95.3 | 91.0 | 92.8 | 93.7 | 94.5 | 95.2 | 94.7 | 95.4 | 95.3 | 95.4 |
| IJCNN10k | 10,000 | 91,701 | 91.6 | 95.7 | 96.0 | 91.2 | 93.3 | 94.2 | 95.4 | 95.7 | 95.9 | 95.7 | 95.9 | 96.0 |
| Isolet | 6,238 | 1,559 | 95.4 | 96.4 | 96.6 | 90.9 | 93.7 | 94.9 | 95.3 | 95.7 | 95.6 | 95.8 | 95.8 | 95.6 |
| Letter | 16,000 | 4,000 | 62.4 | 96.2 | 95.0 | 88.0 | 92.2 | 94.1 | 94.8 | 95.3 | 95.3 | 95.4 | 95.6 | 95.6 |
| Letter4k | 4,000 | 16,000 | 61.2 | 91.4 | 90.2 | 84.9 | 88.1 | 90.1 | 91.1 | 91.5 | 91.9 | 92.1 | 92.0 | 91.7 |
| M-Basic | 12,000 | 50,000 | 90.0 | 96.2 | 96.0 | 95.9 | 96.0 | 96.0 | 95.9 | 95.7 | 95.5 | 95.4 | 95.2 | 95.0 |
| M-Image | 12,000 | 50,000 | 70.7 | 80.8 | 77.0 | 55.6 | 64.1 | 67.9 | 69.9 | 70.9 | 71.4 | 71.9 | 72.1 | 72.0 |
| MNIST10k | 10,000 | 60,000 | 90.0 | 95.7 | 95.4 | 95.6 | 95.7 | 95.6 | 95.5 | 95.3 | 95.2 | 95.0 | 94.8 | 94.7 |
| M-Noise1 | 10,000 | 4,000 | 60.3 | 71.4 | 68.5 | 47.0 | 53.2 | 56.8 | 58.2 | 58.9 | 59.7 | 60.4 | 60.4 | 60.9 |
| M-Noise2 | 10,000 | 4,000 | 62.1 | 72.4 | 70.7 | 46.4 | 54.6 | 57.5 | 59.4 | 60.6 | 61.5 | 61.9 | 61.5 | 61.7 |
| M-Noise3 | 10,000 | 4,000 | 65.2 | 73.6 | 71.9 | 50.1 | 57.1 | 60.6 | 62.3 | 63.1 | 64.0 | 64.4 | 64.7 | 64.8 |
| M-Noise4 | 10,000 | 4,000 | 68.4 | 76.1 | 75.2 | 53.0 | 59.2 | 62.9 | 65.2 | 66.0 | 66.7 | 67.2 | 67.5 | 67.8 |
| M-Noise5 | 10,000 | 4,000 | 72.3 | 79.0 | 78.4 | 55.4 | 62.4 | 66.4 | 68.6 | 68.9 | 70.2 | 70.4 | 70.7 | 71.5 |
| M-Noise6 | 10,000 | 4,000 | 78.7 | 84.2 | 84.3 | 59.9 | 68.4 | 72.6 | 74.2 | 75.5 | 76.1 | 76.5 | 76.6 | 77.3 |
| M-Rand | 12,000 | 50,000 | 78.9 | 84.2 | 84.1 | 60.2 | 69.1 | 72.5 | 74.2 | 75.2 | 76.1 | 76.5 | 76.8 | 77.1 |
| M-Rotate | 12,000 | 50,000 | 48.0 | 84.8 | 83.9 | 82.6 | 83.0 | 82.5 | 81.6 | 80.9 | 80.2 | 79.5 | 78.8 | 78.2 |
| M-Rotimg | 12,000 | 50,000 | 31.4 | 41.0 | 38.5 | 24.1 | 26.8 | 29.3 | 30.6 | 32.0 | 32.7 | 33.4 | 33.7 | 34.1 |
| Optdigits | 3,823 | 1,797 | 95.3 | 97.7 | 97.4 | 95.7 | 96.4 | 96.7 | 97.3 | 97.4 | 97.5 | 97.8 | 97.8 | 97.7 |
| Pendigits | 7,494 | 3,498 | 87.6 | 97.9 | 98.0 | 96.6 | 97.0 | 97.5 | 97.7 | 97.9 | 97.9 | 98.0 | 98.1 | 98.1 |
| Phoneme | 3,340 | 1,169 | 91.4 | 92.5 | 92.0 | 88.0 | 90.4 | 91.3 | 91.5 | 91.7 | 91.6 | 91.5 | 91.9 | 91.6 |
| Protein | 17,766 | 6,621 | 69.1 | 72.4 | 70.7 | 69.0 | 69.9 | 70.6 | 70.7 | 70.5 | 70.3 | 69.7 | 69.4 | 68.8 |
| RCV1 | 20,242 | 60,000 | 96.3 | 96.9 | 96.9 | 94.8 | 94.9 | 94.9 | 94.9 | 94.9 | 94.8 | 94.7 | 94.6 | 94.4 |
| Satimage | 4,435 | 2,000 | 78.5 | 90.5 | 87.8 | 84.3 | 86.1 | 87.1 | 87.1 | 87.3 | 87.7 | 88.0 | 87.8 | 87.7 |
| Segment | 1,155 | 1,155 | 92.6 | 98.1 | 97.5 | 96.1 | 97.0 | 97.4 | 97.2 | 97.3 | 97.2 | 97.2 | 96.9 | 96.9 |
| SensIT20k | 20,000 | 19,705 | 80.5 | 86.9 | 87.0 | 85.5 | 86.2 | 86.6 | 86.7 | 86.7 | 86.3 | 86.0 | 85.3 | 84.7 |
| Shuttle1k | 1,000 | 14,500 | 90.9 | 99.7 | 99.6 | 99.2 | 99.2 | 99.4 | 99.6 | 99.5 | 99.6 | 99.5 | 99.6 | 99.6 |
| Spam | 3,065 | 1,536 | 92.6 | 95.0 | 94.7 | 95.0 | 95.0 | 94.9 | 94.7 | 94.7 | 94.4 | 94.4 | 94.2 | 94.0 |
| Splice | 1,000 | 2,175 | 85.1 | 95.2 | 94.9 | 87.4 | 90.7 | 91.7 | 91.6 | 91.0 | 90.7 | 89.6 | 88.9 | 87.3 |
| USPS | 7,291 | 2,007 | 91.7 | 95.3 | 95.3 | 94.6 | 95.3 | 95.5 | 95.4 | 95.3 | 95.3 | 95.1 | 95.1 | 95.1 |
| Vowel | 528 | 462 | 40.9 | 59.1 | 53.5 | 41.2 | 41.3 | 43.8 | 46.1 | 47.2 | 49.3 | 51.2 | 52.7 | 52.9 |
| WebspamN1-20k | 20,000 | 60,000 | 93.0 | 97.9 | 97.8 | 96.9 | 97.3 | 97.5 | 97.5 | 97.5 | 97.4 | 97.3 | 97.2 | 97.0 |
| Youtube Vision | 11,736 | 10,000 | 63.3 | 72.4 | 72.4 | 59.7 | 65.0 | 68.4 | 69.4 | 69.2 | 68.9 | 67.9 | 66.2 | 64.8 |
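To show how the best α varies across datasets, the short sketch below takes four rows copied from Table 1 and, for each, picks the α with the highest accuracy at k = 8192. Ties go to the smaller α. It is plain Python over the published numbers, not a re-run of the experiments.

```python
# Four rows copied from Table 1 (test accuracy in %, k = 8192), one entry per alpha.
ALPHAS = [0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
ROWS = {
    "Letter":         [88.0, 92.2, 94.1, 94.8, 95.3, 95.3, 95.4, 95.6, 95.6],
    "M-Rotate":       [82.6, 83.0, 82.5, 81.6, 80.9, 80.2, 79.5, 78.8, 78.2],
    "Splice":         [87.4, 90.7, 91.7, 91.6, 91.0, 90.7, 89.6, 88.9, 87.3],
    "Youtube Vision": [59.7, 65.0, 68.4, 69.4, 69.2, 68.9, 67.9, 66.2, 64.8],
}


def best_alpha(accuracies):
    # max() returns the first maximal index, so ties resolve to the smallest alpha.
    best = max(range(len(accuracies)), key=lambda j: accuracies[j])
    return ALPHAS[best], accuracies[best]


for name, acc in ROWS.items():
    print(name, best_alpha(acc))
```

For these rows, the best α is:
- Letter: 1.75 (tied with 2)
- M-Rotate: 0.25
- Splice: 0.5
- Youtube Vision: 0.75

This spread is consistent with treating α as a per-dataset tuning parameter.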

2.2 Detailed Results of Sign α-Stable Random Projections

Figures 1 to 4 present detailed classification results of sign α-stable random projections for 4 selected
datasets, using ℓ2-regularized linear SVM (with a regularization parameter C ∈ [10^{-…}, 10^{…}]). In each
figure, we present the results for k ∈ {64, 128, 256, 512, 1024, 2048, 4096, 8192} projections and
α ∈ {0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2}. All experiments were conducted using LIBLINEAR [?], and we
repeated each randomized experiment 5 times and reported the average results. The classification results are
very stable (i.e., they have very small variance) unless k is too small.

The results (together with Table 1 and other figures later in the paper) show that, given enough projections
(e.g., 8192), the method of sign α-stable random projections can typically achieve good accuracies.




Figure 1: Covertype10k. Classification accuracies of sign α-stable random projections using ℓ2-regularized
SVMs (with a tuning parameter C ∈ [10^{-…}, 10^{…}]) for α ∈ {0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2} and
k ∈ {64, 128, 256, 512, 1024, 2048, 4096, 8192} projections. In each panel, the highest point (i.e., best
accuracy) at k = 8192 was reported in Table 1. In addition, each panel also presents the accuracies of linear
SVM (the pink curve marked by *). All experiments were conducted using LIBLINEAR.

Figure 2: Letter. Classification accuracies of sign α-stable random projections using ℓ2-regularized SVMs
(with a tuning parameter C ∈ [10^{-…}, 10^{…}]) for α ∈ {0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2} and
k ∈ {64, 128, 256, 512, 1024, 2048, 4096, 8192} projections. In each panel, the highest point (i.e., best accuracy)
at k = 8192 was reported in Table 1. In addition, each panel also presents the accuracies of linear SVM (the
pink curve marked by *). All experiments were conducted using LIBLINEAR.

Figure 3: MNIST10k. Classification accuracies of sign α-stable random projections using ℓ2-regularized
SVMs (with a tuning parameter C ∈ [10^{-…}, 10^{…}]) for α ∈ {0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2} and
k ∈ {64, 128, 256, 512, 1024, 2048, 4096, 8192} projections. In each panel, the highest point (i.e., best
accuracy) at k = 8192 was reported in Table 1. In addition, each panel also presents the accuracies of linear
SVM (the pink curve marked by *). All experiments were conducted using LIBLINEAR.

Figure 4: Segment. Classification accuracies of sign α-stable random projections using ℓ2-regularized
SVMs (with a tuning parameter C ∈ [10^{-…}, 10^{…}]) for α ∈ {0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2} and
k ∈ {64, 128, 256, 512, 1024, 2048, 4096, 8192} projections. In each panel, the highest point (i.e., best
accuracy) at k = 8192 was reported in Table 1. In addition, each panel also presents the accuracies of linear
SVM (the pink curve marked by *). All experiments were conducted using LIBLINEAR.


2.3 Detailed Comparisons with 0-Bit Consistent Weighted Sampling (CWS) 


Figures 5 to 8 compare sign α-stable random projections with 0-bit CWS [?] on selected datasets. For
clarity, we only show the results of sign stable random projections for k = 128, 256, 1024, 8192 projections,
and the results of 0-bit CWS for k = 128, 256, 1024 samples. These results demonstrate that 0-bit CWS
requires far fewer samples, although we should keep in mind that 0-bit CWS applies only to nonnegative data.




Figure 5: MNIST10k (top 2 rows) and M-Rotate (bottom 2 rows). We compare sign α-stable random
projections with 0-bit consistent weighted sampling (CWS). Each panel (for each α) consists of 8 curves. The
solid (pink) curve marked by * represents the results of linear SVM. Four solid curves (labelled by k = 128,
k = 256, k = 1024, and k = 8192, respectively) represent the results of sign α-stable random projections
for 4 different k values. The 3 dashed curves correspond to the results of 0-bit CWS for k = 128, 256, 1024
(a higher curve for a higher k value). These experimental results, all obtained using LIBLINEAR, show
that 0-bit CWS requires far fewer samples to achieve the same accuracies.

Figure 6: Pendigits and Satimage. We compare sign α-stable random projections with 0-bit consistent
weighted sampling (CWS). Each panel (for each α) consists of 8 curves. The solid (pink) curve marked by
* represents the results of linear SVM. Four solid curves (labelled by k = 128, k = 256, k = 1024, and
k = 8192, respectively) represent the results of sign α-stable random projections for 4 different k values.
The 3 dashed curves correspond to the results of 0-bit CWS for k = 128, 256, 1024 (a higher curve for a
higher k value). These experimental results, all obtained using LIBLINEAR, show that 0-bit CWS requires
far fewer samples to achieve the same accuracies.

Figure 7: Shuttle1k and Splice. We compare sign α-stable random projections with 0-bit consistent
weighted sampling (CWS). Each panel (for each α) consists of 8 curves. The solid (pink) curve marked
by * represents the results of linear SVM. Four solid curves (labelled by k = 128, k = 256, k = 1024,
and k = 8192, respectively) represent the results of sign α-stable random projections for 4 different k values.
The 3 dashed curves correspond to the results of 0-bit CWS for k = 128, 256, 1024 (a higher curve
for a higher k value). These experimental results, all obtained using LIBLINEAR, show that 0-bit CWS
requires far fewer samples to achieve the same accuracies.

Figure 8: USPS and WebspamN1-20k. We compare sign α-stable random projections with 0-bit consistent
weighted sampling (CWS). Each panel (for each α) consists of 8 curves. The solid (pink) curve marked by
* represents the results of linear SVM. Four solid curves (labelled by k = 128, k = 256, k = 1024, and
k = 8192, respectively) represent the results of sign α-stable random projections for 4 different k values.
The 3 dashed curves correspond to the results of 0-bit CWS for k = 128, 256, 1024 (a higher curve for a
higher k value). These experimental results, all obtained using LIBLINEAR, show that 0-bit CWS requires
far fewer samples to achieve the same accuracies.

2.4 Experiment on a Larger Dataset

The paper on 0-bit CWS [?] only experimented with datasets of moderate size, for an important reason. To
establish correctness, the authors needed to show that the results of 0-bit CWS with enough samples approach
those of the exact min-max kernel. A straightforward and faithful implementation of SVM with the min-max
kernel is to use the LIBSVM pre-computed kernel functionality, i.e., to compute the kernel explicitly and feed
it to the SVM from outside. This strategy, although the most repeatable, is very expensive even for datasets
that are not particularly large [?]. On the other hand, once the correctness of 0-bit CWS has been established,
applying the method to larger datasets is easy, except that we cannot compute the exact result of the min-max kernel.
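For reference, the following NumPy sketch computes the exact min-max kernel matrix that a pre-computed-kernel SVM needs. The function name is hypothetical. The loop makes the cost explicit: every entry touches all D coordinates, so an n × n training kernel costs O(n²D) time and O(n²) memory. This cost is what limits this route to moderate n.

```python
import numpy as np


def min_max_gram(a, b):
    """Exact min-max kernel matrix between the rows of a (n x D) and b (m x D),
    both nonnegative. Costs O(n * m * D) time and O(n * m) memory."""
    gram = np.empty((a.shape[0], b.shape[0]))
    for i, row in enumerate(a):
        # Broadcast one row of `a` against all rows of `b`.
        num = np.minimum(row, b).sum(axis=1)
        den = np.maximum(row, b).sum(axis=1)
        gram[i] = num / den
    return gram


rng = np.random.default_rng(0)
x_train = rng.random((6, 20))            # hypothetical nonnegative training data
k_train = min_max_gram(x_train, x_train)
print(k_train.shape)
```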

Figure 9 presents detailed results on the WebspamN1 dataset, which has 350,000 examples. We use 50%
of the examples for training and the other 50% for testing. With linear SVM, the test classification
accuracy is about 93%. Both sign α-stable random projections and 0-bit CWS can achieve accuracies above 98%
given enough samples. The figure also confirms that 0-bit CWS requires significantly fewer samples than the
number of projections needed by sign stable random projections to achieve comparable accuracies.






Figure 9: WebspamN1. We compare sign α-stable random projections with 0-bit consistent weighted
sampling (CWS). Each panel (for each α) consists of 8 curves. The solid (pink) curve marked by * represents
the results of linear SVM. Four solid curves (labelled by k = 128, k = 256, k = 1024, and k = 8192,
respectively) represent the results of sign α-stable random projections for 4 different k values. The 3 dashed
curves correspond to the results of 0-bit CWS for k = 128, 256, 1024 (a higher curve for a higher k value).

3 Conclusion


This paper provides an extensive empirical study of sign α-stable random projections for large-scale learning
applications. Although the paper focuses on presenting results for classification tasks, one should
keep in mind that the method is a general-purpose data processing tool that can be used for classification,
regression, clustering, or near-neighbor search. Given enough projections, the method can often achieve
good performance. The comparison with 0-bit CWS should also be of interest to practitioners.

Future work: The processing cost of sign α-stable random projections can be substantially reduced by
“very sparse stable random projections” [?]. An empirical study is needed to confirm this claim. Another
interesting line of research is to combine sign stable random projections with 0-bit CWS, for example, by a
strategy similar to that in the recent work on “CoRE kernels” [?].